Search results for "Document processing"

Semi-automated annotation of page-based documents within the Genre and Multimodality framework

2016

This paper describes ongoing work on a tool developed for annotating document images for their multimodal features and compiling this information into a corpus. The tool leverages open source computer vision and natural language processing libraries to describe the content and structure of multimodal documents and to generate multiple layers of XML annotation. The paper introduces the annotation schema, describes the document processing pipeline and concludes with a brief description of future work.

060201 languages & linguistics; Structure (mathematical logic); Information retrieval; Computer science; computer.internet_protocol; business.industry; 05 social sciences; 050801 communication & media studies; 06 humanities and the arts; Temporal annotation; computer.software_genre; Document processing; Pipeline (software); Multimodality; Annotation; 0508 media and communications; Open source; 0602 languages and literature; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Artificial intelligence; business; computer; Natural language processing; XML
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Integration of a structural features-based preclassifier and a man-machine interactive classifier for a fast multi-stroke character recognition

2003

A transputer-based parallel machine for handwritten character recognition is proposed. An algorithm based on structural features and on a tree classifier was used to accomplish the pre-classification of the unknown sample in order to speed up the recognition process. The algorithm for the final classification is based on the description of the strokes through Fourier descriptors. The learning phase is accomplished through a man-machine interactive process. The proposed system can expand its knowledge base. A special representation of this knowledge base is proposed in order to record a great amount of data in a suitable way. A fast multistroke handwritten isolated character recognition syst…

Settore INF/01 - Informatica; Computer science; Intelligent character recognition; business.industry; Sketch recognition; Pattern recognition; Document processing; Intelligent word recognition; ComputingMethodologies_PATTERNRECOGNITION; Feature (machine learning); Artificial intelligence; business; Classifier (UML); Man machine systems; Character recognition; Humans; Handwriting recognition; Pattern recognition; Parallel machines; System testing; Performance evaluation; Prototypes; Energy management